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Abstract — A new generalized multilinear regression model, termed the Higher-Order Partial Least Squares (HOPLS), is introduced 
with the aim to predict a tensor (multiway array) Y from a tensor X through projecting the data onto the latent space and performing 
regression on the corresponding latent variables. HOPLS differs substantially from other regression models in that it explains the data 
by a sum of orthogonal Tucker tensors, while the number of orthogonal loadings serves as a parameter to control model complexity 
and prevent overfitting. The low dimensional latent space is optimized sequentially via a deflation operation, yielding the best joint 
subspace approximation for both X and Y. Instead of decomposing X and Y individually, higher order singular value decomposition 
on a newly defined generalized cross-covariance tensor is employed to optimize the orthogonal loadings. A systematic comparison 
on both synthetic data and real-world decoding of 3D movement trajectories from electrocorticogram (ECoG) signals demonstrate 
the advantages of HOPLS over the existing methods in terms of better predictive ability, suitability to handle small sample sizes, and 
robustness to noise. 
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THE Partial Least Squares (PLS) is a well-established 
framework for estimation, regression and classifica- 
tion, whose objective is to predict a set of dependent 
variables (responses) from a set of independent variables 
(predictors) through the extraction of a small number of 
latent variables. One member of the PLS family is Partial 
Least Squares Regression (PLSR) - a multivariate method 
which, in contrast to Multiple Linear Regression (MLR) 
and Principal Component Regression (PCR), is proven to 
be particularly suited to highly collinear data (I), (2). In 
order to predict response variables Y from independent 
variables X, PLS finds a set of latent variables (also 
called latent vectors, score vectors or components) by 
projecting both X and Y onto a new subspace, while 
at the same time maximizing the pairwise covariance 
between the latent variables of X and Y. A standard 
way to optimize the model parameters is the Non- 
linear Iterative Partial Least Squares (NIPALS) [3|; for an 
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overview of PLS and its applications in neuroimaging 
see pi, (5j, (6). There are many variations of the PLS 
model including the orthogonal projection on latent 
structures (O-PLS) [7|, Biorthogonal PLS (BPLS) (8), re- 
cursive partial least squares (RPLS) (9), nonlinear PLS 
(To) , (TT) . The PLS regression is known to exhibit high 
sensitivity to noise, a problem that can be attributed to 
redundant latent variables (12), whose selection still re- 
mains an open problem (13) . Penalized regression meth- 
ods are also popular for simultaneous variable selection 
and coefficient estimation, which impose e.g., L2 or LI 
constraints on the regression coefficients. Algorithms 
of this kind are Ridge regression and Lasso |14|. The 
recent progress in sensor technology, biomedicine, and 
biochemistry has highlighted the necessity to consider 
multiple data streams as multi-way data structures (15) , 
for which the corresponding analysis methods are very 
naturally based on tensor decompositions (16), (17), (18) . 
Although matricization of a tensor is an alternative way 
to express such data, this would result in the "Large p 
Small n"problem and also make it difficult to interpret 
the results, as the physical meaning and multi-way data 
structures would be lost due to the unfolding operation. 

The TV-way PLS (N-PLS) decomposes the independent 
and dependent data into rank-one tensors, subject to 
maximum pairwise covariance of the latent vectors. This 
promises enhanced stability, resilience to noise, and in- 
tuitive interpretation of the results (19) , (20) . Owing to 
these desirable properties N-PLS has found applications 
in areas ranging from chemometrics (21) , (22) , (23) to 
neuroscience |24|, |25j. A modification of the N-PLS and 
the multi-way covariates regression were studied in (26) , 
(27), [28 1, where the weight vectors yielding the latent 
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variables are optimized by the same strategy as in N- 
PLS, resulting in better fitting to independent data X 
while maintaining no difference in predictive perfor- 
mance. The tensor decomposition used within N-PLS 
is Canonical Decomposition /Parallel Factor Analysis 
(C ANDECOMP / PARAFAC or CP) (29), which makes N- 
PLS inherit both the advantages and limitations of CP 
pQ) . These limitations are related to poor fitness ability, 
computational complexity and slow convergence when 
handling multivariate dependent data and higher order 
(N > 3) independent data, causing N-PLS not to be 
guaranteed to outperform standard PLS (23), (31) . 

In this paper, we propose a new generalized mutilin- 
ear regression model, called Higer-Order Partial Least 
Squares (HOPLS), which makes it possible to predict an 
Mth-order tensor Y (M > 3) (or a particular case of 
two-way matrix Y) from an TVth-order tensor X(iV > 3) 
by projecting tensor X onto a low-dimensional com- 
mon latent subspace. The latent subspaces are optimized 
sequentially through simultaneous rank-(l, L 2 , . . . , L^) 
approximation of X and rank-(l, iv~2, • • • , Km) approxi- 
mation of Y (or rank-one approximation in particular 
case of two-way matrix Y). Owing to the better fitness 
ability of the orthogonal Tucker model as compared to 
CP (16) and the flexibility of the block Tucker model 

(32) , the analysis and simulations show that HOPLS 
proves to be a promising multilinear subspace regression 
framework that provides not only an optimal tradeoff 
between fitness and model complexity but also enhanced 
predictive ability in general. In addition, we develop a 
new strategy to find a closed-form solution by employ- 
ing higher-order singular value decomposition (HOSVD) 

(33) , which makes the computation more efficient than 
the currently used iterative way. 

The article is structured as follows. In Section |5J an 
overview of two-way PLS is presented, and the notation 
and notions related to multi-way data analysis are in- 
troduced. In Section [3] the new multilinear regression 
model is proposed, together with the corresponding 
solutions and algorithms. Extensive simulations on syn- 
thetic data and a real world case study on the fusion of 
behavioral and neural data are presented in Section [2J 
followed by conclusions in Section [5] 

2 Background and Notation 

2.1 Notation and definitions 

iVth-order tensors (multi-way arrays) are denoted by un- 
derlined boldface capital letters, matrices (two-way ar- 
rays) by boldface capital letters, and vectors by boldface 
lower-case letters. The ith entry of a vector x is denoted 
by Xi, element (i, j) of a matrix X is denoted by Xij, 
and element (ii, ^2, • • • , ^iv) of an TVth-order tensor X G 
R hxi 2 x-xi N by . n or (x) ili2 , mmiN . Indices typically 

range from 1 to their capital version, e.g., %n = 1, • • • , ijv- 
The mode-n matricization of a tensor is denoted by 
X( n ) G R / nX/i-/ n _i/„+i."/ JV< jh e nt h factor matrix in a 

sequence is denoted by A^ n ). 



The n-mode product of a tensor X G £ J i x, " x ^ x " ,x/ ^ 
and matrix A G R JnXln is denoted by Y = X x n A G 

j^iix-xi n -ixJ n xi n+1 x-xi N and is defined as: 

yi 1 i 2 ...i n -lj n in + l---iN ~ y ^ X ili2---in---iN a jnin' 0") 

in 

The rank-(R\, R 2 , Rn) Tucker model (34) is a tensor 
decomposition defined and denoted as follows: 

Y « G xi A (1) x 2 A (2) x 3 • • • x N A m 

= [G;AW,...,AW], (2) 

where G G R RiXR2X ~ xRn , (R n < I n ) is the core tensor 
and A^ n ) G I JnXjin are the factor matrices. The last term is 
the simplified notation, introduced in (35) , for the Tucker 
operator. When the factor matrices are orthonormal and 
the core tensor is all-orthogonal this model is called 
HOSVD (33), (33. 

The CP model\YE\, 129) , (36), (37), (38) became promi- 
nent in Chemistry |28l and is defined as a sum of rank- 
one tensors: 

Y«f>a( 1 )oa( 2 )o...o a W (3) 

7-1 

where the symbol V denotes the outer product of vec- 
tors, af n ^ is the column-r vector of matrix A^ n \ and A r 
are scalar s. The CP model can also be represented by 
{2]), under the condition that the core tensor is super- 
diagonal, i.e., Ri = R 2 = • • • = Rn and gi^...^ = if 
in 7^ im for all n ^ m. 

The 1-mode product between G e h 1x/ 2x-x/jv an( j 
t G R /lXl is of size I\ x i 2 x • • • x I N , and is defined as 

(G Xi t)i x i 2mmm i N = gu 2 ...i N ti 1 - (4) 
The inner product of two tensors A,B e ^hxh-.-xiN ^ s 
defined by (A, B) = Y,i li2 ...i N a ^...i N ^i x i 2 ...i Nf and the 
squared Frobenius norm by ||A|||i = (A, A). 

The n-mode cross-covariance between an A/'th-order 
tensor X e ^hx-xi n x-xi N and an Mth-order ten- 
sor Y g £ J i x - x/ nX-xJ M with the same size I n 
on the nth-mode, denoted by COV{ n;n }(X, Y) G 

J^/l X ••• Xl n -i Xl n + 1 X ••• Xl N X Ji X ••• X Jn-1 X J n + 1 X ••• X J M jg 

fined as 

C = COV {n; „ } (X,Y)=<X,Y> { „ ;n}) (5) 

where the symbol < •,• >{ n;n } represents an n-mode 
multiplication between two tensors, and is defined as 

C il,...,in-l,in+l---iN,jl,---,jn-ljn+l---3M ~ 
In 

a 'il,"-,in > -"j*Jv2/jl, — j*nj---jiM' (^) 

in=l 

As a special case, for a matrix Y G R /nXM , the n-mode 
cross-covariance between X and Y simplifies as 

COV {n;1} (X,Y) = Xx n Y T , (7) 

under the assumption that n-mode column vectors of X 
and columns of Y are mean-centered. 
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2.2 Standard PLS (two-way PLS) 
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Fig. 1: The PLS model: data decomposition as a sum of rank- 
one matrices. 



The PLS regression was originally developed for 
econometrics by H. Wold [3|, [39 1 in order to deal with 
collinear predictor variables. The usefulness of PLS in 
chemical applications was illuminated by the group of 
S. Wold Go), (4TJ, after some initial work by Kowalski 
et al. [42|. Currently, the PLS regression is being widely 
applied in chemometrics, sensory evaluation, industrial 
process control, and more recently, in the analysis of 
functional brain imaging data (43), (44), (45), (46), (47) . 

The principle behind PLS is to search for a set of latent 
vectors by performing a simultaneous decomposition of 
X e R lxj and Y e R IxM with the constraint that 
these components explain as much as possible of the 
covariance between X and Y. This can be formulated 
as 



T pi 



UQ 7 



7-1 

R 
r=l 



E, 



F, 



(8) 



(9) 



where T = [ti, t 2 , . . . , t#] e R IxR consists of R extracted 
orthonormal latent variables from X, i.e. T T T = I, and 
U = [ui, u 2 , . . . , ur] g R /XjR are latent variables from 
Y having maximum covariance with T column-wise. 
The matrices P and Q represent loadings and E, F are 
respectively the residuals for X and Y. In order to find 
the first set of components, we need to optimize the two 
sets of weights w, q so as to satisfy 



max [w T X T Yq], s. t. w T w 

{w,q} 



l,q T q=l. 



(10) 



The latent variable then is estimated as t = Xw. Based 
on the assumption of a linear relation u « dt, Y is 
predicted by 

Y « TDQ T , (11) 

where D is a diagonal matrix with d rr = u^t r /t^t r , im- 
plying that the problem boils down to finding common 
latent variables T that explain the variance of both X 
and Y, as illustrated in Fig. [T] 

3 Higher-order PLS (HOPLS) 

For a two-way matrix, the low-rank approximation is 
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(h x 1) 



(7 3 x RL 3 ) 
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(hxR) (R x RL 2 x RL 3 ) ( RL ^h) (/x/ 2 x/ 3 ) 
Raw Data Latent variables Loadings Residuals 

Fig. 2: Schematic diagram of the HOPLS model: approximat- 
ing X as a sum of rank-(l, L2, L3) tensors. Approximation 
for Y follows a similar principle with shared common latent 
components T. 



higher-order tensor, these two criteria lead to completely 
different models (i.e., CP and Tucker model). The TV-way 
PLS (N-PLS), developed by Bro (19), is a straightforward 
multi-way extension of standard PLS based on the CP 
model. Although CP model is the best low-rank approx- 
imation, Tucker model is the best subspace approxima- 
tion, retaining the maximum amount of variation (26) . 
It thus provides better fitness than the CP model except 
in a special case when perfect CP exists, since CP is a 
restricted version of the Tucker model when the core 
tensor is super-diagonal. 

There are two different approaches for extracting the 
latent components: sequential and simultaneous meth- 
ods. A sequential method extracts one latent component 
at a time, deflates the proper tensors and calculates the 
next component from the residuals. In a simultaneous 
method, all components are calculated simultaneously 
by minimizing a certain criterion. In the following, we 
employ a sequential method since it provides better 
performance. 

Consider an A^th-order independent tensor X e 
jjiix-x/jv an( j an Mth-order dependent tensor Y e 
R Jl x ' " x Jm , having the same size on the first mode, i.e., 
h = Ji- Our objective is to find the optimal subspace 
approximation of X and Y, in which the latent vectors 
from X and Y have maximum pairwise covariance. 
Considering a linear relation between the latent vectors, 
the problem boils down to finding the common latent 
subspace which can approximate both X and Y simul- 
taneously. We firstly address the general case of a tensor 
X(iV > 3) and a tensor Y(M > 3). A particular case with 
a tensor X(iV > 3) and a matrix Y(M = 2) is presented 
separately in Sec. 3.3 using a slightly different approach. 



3.1 Proposed model 

Applying Tucker decomposition within a PLS frame- 



equivalent to subspace approximation, however, for a work is not straightforward, and to that end we propose 
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a novel block-wise orthogonal Tucker approach to model 
the data. More specifically, we assume X is decomposed 
as a sum of rank-(l, L 2 , . . . , L/v) Tucker blocks, while Y 
is decomposed as a sum of rank-(l, if 2 , • • • , Km) Tucker 
blocks (see Fig. which can be expressed as 

R 



r=l 



(12) 



Y = Br Xl t r X 2 Q« X 3 • • • XmQ^- 1 ) 



r=l 



where R is the number of latent vectors, t r e R/ 1 



r , v >, N-1 

is the r-th latent vector, |Pr j | i G 



x L n 



and 



'} 

) r 



M-l 



G R J ™+i xA Wi are loading matri- 
ces on mode-n and mode-m respectively, and G r e 

r ixl 2 x-xl n and Q G R ixK 2 x-xx M are core tensors. 

However the Tucker decompositions in ( [l2| ) are not 
unique |16| due to the permutation, rotation, and scaling 
issues. To alleviate this problem, additional constraints 
should be imposed such that the core tensors G r and 
D r are all-orthogonal, a sequence of loading matrices 



are column-wise orthonormal, i.e., P 



= I and 



Qr m ^ T Qr m ^ = I, the latent vector is of length one, i.e. 



||t r || F = 1. Thus, each term in ( [12) is represented as an 
orthogonal Tucker model, implying essentially unique- 
ness as it is subject only to trivial indeterminacies (32) . 

By defining a latent matrix T = [ti, . . . ,t#], mode-n 
loading matrix P 



(n) 



>(n) 

1 ' 



ing matrix Q 



. . . , P^], mode-m load- 

^ = [Qi m \ . . . , Q#^] and core tensor 
G = blockdiagCGi,...,^) e rRxRl 2 x-xrl n / g = 
blockdiag^,...,^) e rRxrk 2 x-xrk m f the H OPLS 
model in ( [12] ) can be rewritten as 



X = G XiTx 2 P 



X3 • • • Xat r + 

(!) „ „ ^r(M-l) 



Y = D Xi T x 2 Q x 3 --- x M Q 



(13) 



where and are residuals after extracting R com- 
ponents. The core tensors G and D have a special block- 
diagonal structure (see Fig. [2| and their elements indicate 
the level of local interactions between the correspond- 
ing latent vectors and loading matrices. Note that the 
tensor decomposition in < [l3] > is similar to the block 
term decomposition discussed in (32), which aims to 
the decomposition of only one tensor. However, HOPLS 
attempts to find the block Tucker decompositions of two 
tensors with block-wise orthogonal constraints, which at 
the same time satisfies a certain criteria related to having 
common latent components on a specific mode. 

Benefiting from the advantages of Tucker decompo- 
sition over the CP model (16) , HOPLS promises to 
approximate data better than N-PLS. Specifically, HO- 
PLS differs substantially from the N-PLS model in the 
sense that extraction of latent components in HOPLS is 
based on subspace approximation rather than on low- 
rank approximation and the size of loading matrices 
is controlled by a hyperparameter, providing a tradeoff 



between fitness and model complexity. Note that HOPLS 
simplifies into N-PLS if we define Vn : {L n } = 1 and 
Vm : {K m } = 1. 

3.2 Optimization criteria and algorithm 

The tensor decompositions in ([l2| consists of two si- 
multaneous optimization problems: (i) approximating X 
and Y by orthogonal Tucker model, (ii) having at the 
same time a common latent component on a specific 
mode. If we apply HOSVD individually on X and Y, 
the best rank-(l, L 2 , . . . , L^) approximation for X and 
the best rank-(l, K 2 , . . . , Km) approximation for Y can 
be obtained while the common latent vector t r cannot be 
ensured. Another way is to find the best approximation 
of X by HOSVD first, subsequently, Y can be approxi- 
mated by a fixed t r . However, this procedure, which re- 
sembles multi-way principal component regression (28) , 
has the drawback that the common latent components 
are not necessarily predictive for Y. 

The optimization of subspace transformation accord- 
ing to ( [T2) will be formulated as a problem of deter- 
mining a set of orthogonormal loadings P^ , Qr m ^ , r — 
1,2, ...,R and latent vectors t r that satisfies a certain 
criterion. Since each term can be optimized sequentially 
with the same criteria based on deflation, in the follow- 
ing, we shall simplify the problem to that of finding 
the first latent vector t and two sequences of loading 
matrices P^ and Q( m ). 

In order to develop a strategy for the simultaneous 
minimization of the Frobenius norm of residuals E and 
F, while keeping a common latent vector t, we first need 
to introduce the following basic results: 

Proposition 3.1. Given a tensor X e R Jl x *** xIn and column 
orthonormal matrices P (n) e R In + lXLn + 1 ,n = 1, . . . , N - 1, 
t e IR 71 with \\t\\ F = 1, the least-squares (LS) solution to 
mine ||X-G x 1 1 x 2 P (1) x 3 • • • x N ^ N -^\\ 2 F is given by 



G = Xxit J 



,pa) r 



X 3 ••• ><N 



p(A/-l)T 



Proof: This result is very well known and is widely 
used in the literature (16) , p3) . A simple proof is 
based on writing the mode-1 matricization of tensor 
X as 

X (1) = tG ( i ) (P^- 1 ) • • • PW) T + E (1) , (14) 

where tensor is the residual and the symbol 
'(g)' denotes the Kronecker product. Since t T t = 1 and 
(p(N-i) $ . . . ® pW) is column orthonormal, the LS 
solution of G(i) with fixed matrices t and P( n ) is 
given by G ( i } = t T X {1) (P( N -^ ®. • ^P^)); writing 
it in a tensor form we obtain the desired result. □ 

Proposition 3.2. Given a fixed tensor X e H IiX '" xIn , the 
following two constrained optimization problems are equiva- 
lent: 

1) miri{p(.) 5t} ||X - G xi t x 2 P« x 3 ■ • • P^" 1 ) \\ 2 F , 
s. t. matrices p( n ) are column orthonormal and \\t\\F = 1. 

2) max {P( „ ))t , ||X X! t T x 2 pW T x 3 • • -x N P^ T \\ 2 Fr 
s. t. matrices p( n ) are column orthonormal and \\t\\ F = 1. 
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The proof is available in |16| (see pp. 477-478). 
Assume that the orthonormal matrices p( n \Q( m \t 



are given, then from Proposition 3.1 the core tensors in 
fL2} can be computed as 



G = X Xi t T x 2 P^ T 
D = Y x 1 t T x 2 Q (1)T 



x 3 • • -Xn 



p(N-l)T 



x 3 



•x M Q 



(M-l)T 



(15) 



According to Proposition 3.2 minimization of ||E||f and 
||F||f under the orthonormality constraint is equivalent 
to maximization of \\G\\f and ||D||f- 

However, taking into account the common latent vec- 
tor t between X and Y, there is no straightforward way 
to maximize \\Q]\f and ||D|| F simultaneously. To this 
end, we propose to maximize a product of norms of two 
core tensors, i.e., max{||G|||, • ||D|||,}. Since the latent 
vector t is determined by p( n ),Q( m ), the first step is 
to optimize the orthonormal loadings, then the common 
latent vectors can be computed by the fixed loadings. 



Proposition 3.3. Let G e R 



lxLo x ••• XLa 



and D e 



R lxK 2 



, then II < G,D > 



{i;i} 



|G|| 2 F 



|D|||. 



Proof: 



<G,D> {1;1} ||^ = ||vec(G)vec T (D)|||, 

= trace (vec(D)vec T (G)vec(G)vec T (D) T ) 



= l|vec(G) 



||vec(D) 



11^- 



(16) 



|G|| 2 F 



where vec(G) G R L2L3 " - Ln is the vectorization of 
the tensor G. □ 
From Proposition |3.3[ observe that to maximize 
is equivalent to maximizing 
II <G,D> {1;1} \\%. According to Il5} and t T t = 1, 
II < G,D >{i ; i} \\ F can be expressecTas 

I [< X, Y > {1;1} ; P^T. • ■ ,P (iV - 1)T , Q (1) ^ • • ,Q (M " 1)T ] ||* • 

\\7) 

Note that this form is quite similar to the optimization 
problem for two-way PLS in ( [To) , where the cross- 
covariance matrix X T Y is replaced by <X,X>{i ; i}- 
In addition, the optimization item becomes the norm 
of a small tensor in contrast to a scalar in ( [To} . Thus, 
if we define <X, Y>| 1;1 }asa mode-1 cross-covariance 
tensor C = COV {1;1} (X, Y) G R^x-x/^x j 2 x-x J M/ t h e 
optimization problem can be finally formulated as 



max 
{p( n ),Q( m )} 



!; P (1)T ,. . . ,P (Ar " 1)T , Q (1)T ,. . ., Q (M " 1)T 

s. t. pM T pM = i Ln+i j QWrqW = i Rm+i ^ (18 ) 

where p( n \n = 1, . . . , N- 1 and Q( m \ra = 1,...,M-1 
are the parameters to optimize. 

Based on Proposition |3.2| and orthogonality of 
p(n) Q(m)^ ^ optimization problem in (18> is equiv- 
alent to find the best subspace approximation of C as 

C « (G^; PW, . . . , P^" 1 ), QW, . . . , Q^" 1 )], (19) 

which can be obtained by rank-(L2, . . . , Lat, if 2, . . • , Km) 
HOSVD on tensor C. Based on Proposition 3.1 the 



Algorithm 1 The Higher-order Partial Least Squares (HOPLS) 
Algorithm for a Tensor X and a Tensor Y 

Input: X G R /lX - x/ ^,Y G R Jl x - x Jm , N > 3, M > 3 
and Ji = J\. 

Number of latent vectors is R and number of loading 



vectors are {L n }% =2 and {K m }^ 



,M - 1. 



Output: {Pr ; };{Qr ; };{G r };{D r };T 
r = 1, . . . , R; n = 1, . . . , N — 1; m = 1, 
Initialization: E x X, F x ^— Y. 
for r = 1 to R do 

if ||E r || F >e and ||F r || F >e then 
C r ^< E r ,F r >{i,i}; 
Rank-(L 2 , . . . , L N: K 2: . . . , K M ) 
Tucker decomposition of C r by HOOI |16| as 
C r ,[GP;p( 1 )...,Pr» ) Q«...,Q^ 1 »] 
t r the first leading left singular vector by 



orthogonal 



SVD 



E. 



X 2 P« T X 3 



G r <— [E r ; t^T, P^ ^ , . . . , P 
D r ^[F r ;t^,Q( 1)T , 
Deflation: 



(w-i)t. 



(!). 



Qr 



(M-I)Tt, 



E r +1 <~ Er 



[G r ;t r ,P 

[Br ' 5 Qr ? • • • ? Q 



(1) 
r 1 

(1) 



■ P^l; 

^(M-l)i,. 



else 

Break; 
end if 
end for 



optimization term in ( [18] ) is equivalent to the norm 
of core tensor G^. To achieve this goal, the higher- 
order orthogonal iteration (HOOI) algorithm (16) , (37) , 
which is known to converge fast, is employed to find 
the parameters P( n ) and Q( m ) by orthogonal Tucker 
decomposition of C. 

Subsequently, based on the estimate of the loadings 
p( n ) and Q( m ), we can now compute the common latent 
vector t. Note that taking into account the asymmetry 
property of the HOPLS framework, we need to estimate 
t from predictors X and to estimate regression coefficient 
D for prediction of responses Y. For a given set of load- 
ing matrices {P( n )}, the latent vector t should explain 
variance of X as much as possible, that is 



arg mm 



X-[G;t,pW ..^P^" 1 )] 



(20) 



which can be easily achieved by choosing t as 
the first leading left singular vector of the matrix 
(X x 2 P^) T x 3 • • • x N p( Ar - 1 ) T ) (1) as used in the HOOI 
algorithm (see (16) , (35)). Thus, the core tensors G and 
D are computed by (15} . 

The above procedure should be carried out repeatedly 
using the deflation operation, until an appropriate num- 
ber of components (i.e., R) are obtained, or the norms 
of residuals are smaller than a certain threshold. The 
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deflation^] is performed by subtracting from X and Y the 
information explained by a rank-(l, L2, . . . , Ln) tensor X 
and a rank-(l, K 2 , • . . , Km) tensor Y, respectively The 
HOPLS algorithm is outlined in Algorithm [TJ 

3.3 The case of the tensor X and matrix Y 

Suppose that we have an TVth-order independent tensor 
XgE JiX "' xJjv (N > 3) and a two-way dependent data 
Y g R/ 1 xM , with the same sample size I\. Since for two- 
way matrix, subspace approximation is equivalent to 
low-rank approximation. HOPLS operates by modeling 
independent data X as a sum of rank-(l, L2, . . . , Ln) 
tensors while dependent data Y is modeled with a sum 
of rank-one matrices as 

R 

Y = ^d r t r q 7 : + F i? , (21) 

r=l 

where ||q r || = 1 and d r is a scalar. 

Proposition 3.4. Let Y G R /xM and q G R M is of length 
one, then t = Yq solves the problem min £ ||Y - tq T |||i. In 
other words, a linear combination of the columns of Y by 
using a weighting vector q of length one has least squares 
properties in terms of approximating Y. 

Proof Since q is given and ||q|| = 1, it is obvious 
that the ordinary least squares solution to solve 
the problem is t = Yq(q T q) _1 , hence, t = Yq. 
If a q with length one is found according to some 
criterion, then automatically tq T with t = Yq gives 
the best fit of Y for that q. □ 
As discussed in the previous section, the problem of 
minimizing ||E|||i with respect to matrices P( n ) and vec- 
tor t E R 7 is equivalent to maximizing the norm of core 
tensor G with an orthonormality constraint. Meanwhile, 
we attempt to find an optimal q with unity length which 
ensures that Yq is linearly correlated with the latent 
vector t, i.e., dt = Yq, then according to Proposition 3.4 



dtq T gives the best fit of Y. Therefore, replacing t by 
«i _1 Yq in the expression for the core tensor G in ( |15| ), 
we can optimize the parameters of X-loading matrices 
p( n ) and Y-loading vector q by maximizing the norm of 
G, which gives the best approximation of both tensor X 
and matrix Y. Finally, the optimization problem of our 
interest can be formulated as: 

max ||X x 1 Y T x 1 q T x 2 P« r x 3 • • • x,v P (7V - 1)T ||^, 

{P(-),q} 

(22) 



S. t. PW T PW = I, ||q|| F = 1. 



where the loadings P( n ) and q are parameters to op- 
timize. This form is similar to ( [18} , but has a different 
cross-covariance tensor C = X x 1 Y T defined between 
a tensor and a matrix, implying that the problem can be 
solved by performing a rank-(l, L2, . . . , Ljy) HOSVD on 

1. Note that the latent vectors are not orthogonal in HOPLS algo- 
rithm, which is related to deflation. The theoretical explanation and 
proof are given in the supplement material. 



C. Subsequently, the core tensor G^ corresponding to 
C can also be computed. 

Next, the latent vector t should be estimated so as to 
best approximate X with given loading matrices P^ n ). 
According to the model for X, if we take its mode-1 
matricizacion, we can write 



tG (1) (P 



(N-l)T , 



'P (1) ) T + E (1) , (23) 



where G (1 ) G JR 1 *^^.. .l n ig gti rj unknown. However, 
the core tensor G (i.e., [X; t T , P« T , . . . , P^" 1 ) 7 ]) and 
the core tensor G (c) (i.e., [C; q T , pW T , . . . , P^" 1 ^]) 
has a linear connection that G^ = dG. Therefore, 
the latent vector t can be estimated in another way 
that is different with the previous approach in Section 



)dV X (1) , p(n) 



3.2 For fixed matrices G(i) = d~ 1 (G_ y 
tKe least square solution for the normalized t, which 
minimizes the squared norm of the residual ||E(i) \\ 2 F , can 
be obtained from 

t <- (Xx 2 P« T X3- • •x Jv p( JV - 1 ) T ) (1) Gjg> + , t <- t/\\t\\ F , 

(24) 

where we used the fact that P^ n ^ are columnwise or- 
thonormal and the symbol denotes Moore-Penrose 
pseudoinverse. With the estimated latent vector t, and 
loadings q, the regression coefficient used to predict Y 
is computed as 

d = t T Yq. (25) 

The procedure for a two-way response matrix is sum- 
marized in Algorithm [2] In this case, HOPLS model 
is also shown to unify both standard PLS and N-PLS 
within the same framework, when the appropriate pa- 
rameters L n are selectecQ 

3.4 Prediction of the Response Variables 

Predictions from the new observations X new are per- 
formed in two steps: projecting the data to the low- 
dimensional latent space based on model parameters 
G r , Pi n \ and predicting the response data based on 
latent vectors T new and model parameters Ql m \ D r . 
For simplicity, we use a matricized form to express the 
prediction procedure as 

w T new Q* T = X^ W WQ* T , (26) 

where W and Q* have R columns, represented by 

w r =(p( A, - 1 )«...®pW)G+ 1 



q; = D r (i)(Q^- 1) ®---®Q^ 1) ) • 



(27) 



In the particular case of a two-way matrix Y, the pre- 
diction is performed by 



Y new _ X™WDQ J 



(28) 



where D is a diagonal matrix whose entries are d r and 
rth column of Q is q r/ r = 1, . . . , R. 

2. Explanation and proof are given in the supplement material. 
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Algorithm 2 Higher-order Partial Least Squares (HOPLS2) 
for a Tensor X and a Matrix Y 

Input: X g R'ix^x-x^tv > 3 and Y G R /lXM 

The Number of latent vectors is R and the number of 
loadings are {L n }% =2 . 

Output: {P^ n) };Q;{G r };D;T;r = 1, . . . , R, n = 2, . . . , N. 

Initialization: E x «— X, Fi ^— Y. 
for r = 1 to R do 

if ||E r || F >e and ||F r || F >e then 
C r <- E r X! F^ ; 

Perform rank-(l, L 2 , • • • , Ljv) HOOI on C r as 
C r « G^ c) X! q r x 2 P^ 1} x 3 • • • x N P^" 1} ; 
t r ^(E r x 2 P< 1} x 3 - • -x* Pl iV - 1) ) (i) (vec T (G^)) + ; 
t r <- t r /||t r || F ; 

G r ^[E r ;t-P^...,P^-n 
Deflation: 

E r+1 ^— E r - [G r ; t r , P^, . . . , Pr^ 
F r _^i «< — F r — <i r t r q r ; 
end if 
end for 



3.5 Properties of HOPLS 

Robustness to noise. An additional constraint of keeping 
the largest {L n }^ =2 loading vectors on each mode is 
imposed in HOPLS, resulting in a flexible model that 
balances the two objectives of fitness and the significance 
of associated latent variables. For instance, a larger L n 
may fit X better but introduces more noise to each 
latent vector. In contrast, N-PLS is more robust due to 
the strong constraint of rank-one tensor structure, while 
lacking good fit to the data. The flexibility of HOPLS 
allows us to adapt the model complexity based on 
the dataset in hands, providing considerable prediction 
ability (see Fig. [IJ [6]). 

"Large p, Small n" problem. This is particularly impor- 
tant when the dimension of independent variables is 
high. In contrast to PLS, the relative low dimension of 
model parameters that need to be optimized in HOPLS. 
For instance, assume that a 3th-order tensor X has the 
dimension of 5 x 10 x 100, i.e., there are 5 samples and 
1000 features. If we apply PLS on Xm with size of 
5 x 1000, there are only five samples available to optimize 
a 1000-dimensional loading vector p, resulting in an 
unreliable estimate of model parameters. In contrast, 
HOPLS allows us to optimize loading vectors, having 
relatively low-dimension, on each mode alternately; thus 
the number of samples is significantly elevated. For 
instance, to optimize 10-dimensional loading vectors on 
the second mode, 500 samples are available, and to op- 
timize the 100-dimensional loading vectors on the third 
mode there are 50 samples. Thus, a more robust estimate 
of low-dimensional loading vectors can be obtained, 



which is also less prone to overfitting and more suitable 
for "Large p, Small n" problem (see Fig. [!}. 

Ease of interpretation. The loading vectors in p( n ) reveal 
new subspace patterns corresponding to the n-mode 
features. However, the loadings from Unfold-PLS are dif- 
ficult to interpret since the data structure is destroyed by 
the unfolding operation and the dimension of loadings 
is relatively high. 

Computation. N-PLS is implemented by combining a 
NIPALS-like algorithm with the CP decomposition. In- 
stead of using an iterative algorithm, HOPLS can find 
the model parameters using a closed-form solution, i.e., 
applying HOSVD on the cross-covariance tensor, result- 
ing in enhanced computational efficiency. 

Due to the flexibility of HOPLS, the tuning parameters 
of L n and K m , controlling the model complexity, need 
to be selected based on calibration data. Similarly to the 
parameter R, the tuning parameters can be chosen by 
cross-validation. For simplicity, two alternative assump- 
tions will been utilized: a) Vn,Vm, L n = K m = A; b) 
L n = f]R n ,K m = r]R m ,0 < rj < 1, i.e., explaining the 
same percentage of the n-mode variance. 

4 Experimental Results 

In the simulations, HOPLS and N-PLS were used to 
model the data in a tensor form whereas PLS was per- 
formed on a mode-1 matricization of the same tensors. 
To quantify the predictability, the index Q 2 was defined 
as Q 2 = 1 - ||Y - Y|||y||Y|||,, where Y denotes the 
prediction of Y using a model created from a calibration 
dataset. Root mean square errors of prediction (RMSEP) 
were also used for evaluation |48| . 

4.1 Synthetic data 

In order to quantitatively benchmark our algorithm 
against the state of the art, an extensive comparative 
exploration has been performed on synthetic datasets to 
evaluate the prediction performance under varying con- 
ditions with respect to data structure, noise levels and 
ratio of variable dimension to sample size. For parameter 
selection, the number of latent vectors (R) and number 
of loadings (L n = K m = A) were chosen based on five- 
fold cross-validation on the calibration dataset. To reduce 
random fluctuations, evaluations were performed over 
50 validation datasets generated repeatedly according to 
the same criteria. 

4.1.1 Datasets with matrix structure 

The independent data X and dependent data Y were 
generated as: 

X = TP T + f E, Y = TQ T + f F, (29) 

where latent variables {t, p, q} ~ Af(0, 1), E, F are Gaus- 
sian noises whose level is controlled by the parameter 
£. Both the calibration and the validation datasets were 
generated according to |29|, with the same loadings 
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P, Q, but a different latent T which follows the same 
distribution Af(0, 1). Subsequently, the datasets were re- 
organized as TVth-order tensors. 

To investigate how the prediction performance is af- 
fected by noise levels and small sample size, {X, Y} e 
R 20xioxio (Case x) and {x,Y} e R 10 * 10 * 10 (Case 2) 

were generated under varying noise levels of lOdB, 5dB, 
OdB and -5dB. In the case 3, {X, Y} e R 10 * 10 * 10 were 
generated with the loadings P, Q drawn from a uniform 
distribution Z7(0, 1). The datasets were generated from 
five latent variables (i.e., T has five columns) for all the 
three cases. 
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Fig. 3: Five-fold cross-validation performance of HOPLS at 
different noise levels versus the number of latent variables (R) 
and loadings (A). The optimal values for these two parameters 
are marked by green squares. 



TABLE 1: The selection of parameters R and A in Case 2. 



SNR 


PLS 


N-PLS 


HOPLS 


SNR 


PLS 


N-PLS 


HOPLS 


R A 


R A 


lOdB 


5 


7 


9 6 


OdB 


3 


5 


5 4 


5dB 


5 


6 


7 5 


-5dB 


3 


1 


3 5 



best performance is obtained by smaller A, implying that 
only few significant loadings on each mode are kept in 
the latent space. This is expected, due to the fact that the 
model complexity is controlled by A to suppress noise. 
The optimal R and A for all three methods at different 
noise levels are shown in Table Q] 
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There are two tuning parameters, i.e., number of latent 
variables R and number of loadings A for HOPLS and 
only one parameter R for PLS and N-PLS, that need to be 
selected appropriately. The number of latent variables R 
is crucial to prediction performance, resulting in under- 
modelling when R was too small while overfitting easily 
when R was too large. The cross-validations were per- 
formed when R and A were varying from 1 to 10 with 
the step length of 1. In order to alleviate the computation 
burden, the procedure was stopped when the perfor- 
mance starts to decrease with increasing A. Fig. [3] shows 
the grid of cross-validation performance of HOPLS in 
Case 2 with the optimal parameters marked by green 
squares. Observe that the optimal A for HOPLS is related 
to the noise levels, and for increasing noise levels, the 



Fig. 4: The prediction performance comparison among HO- 
PLS, N-PLS and PLS at different noise levels for three cases. 
Casel: {X,X} e R 20xl0x10 and {P,Q} - AA(0,1); Case 2: 
{X,Y} e R 10 x 10xl ° and {P, Q} - jV(0, 1); Case 3: {X,X} e 
R 1Oxldxlo and{P,Q}~[/(0,l). 



After the selection the parameters, HOPLS, N-PLS and 
PLS are re-trained on the whole calibration dataset using 
the optimal R and A, and were applied to the validation 
datasets for evaluation. Fig. [4] illustrates the predictive 
performance over 50 validation datasets for the three 
cases at four different noise levels. In Case 1, a relatively 
larger sample size was available, when SNR=10dB, HO- 
PLS achieved a similar prediction performance to PLS 
while outperforming N-PLS. With increasing the noise 
level in both the calibration and validation datasets, 
HOPLS showed a relatively stable performance whereas 
the performance of PLS decreased significantly. The su- 
periority of HOPLS was shown clearly with increasing 
the noise level. In Case 2 where a smaller sample size 
was available, HOPLS exhibited better performance than 
the other two models and the superiority of HOPLS 
was more pronounced at high noise levels, especially 
for SNR<5dB. These results demonstrated that HOPLS 
is more robust to noise in comparison with N-PLS and 
PLS. If we compare Case 1 with Case 2 at different 
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noise levels, the results revealed that the superiority of 
HOPLS over the other two methods was enhanced in 
Case 2, illustrating the advantage of HOPLS in modeling 
datasets with small sample size. Note that N-PLS also 
showed better performance than PLS when SNR<0dB in 
Case 2, demonstrating the advantages of modeling the 
dataset in a tensor form for small sample sizes. In Case 
3, N-PLS showed much better performance as compared 
to its performance in Case 1 and Case 2, implying sensi- 
tivity of N-PLS to data distribution. With the increasing 
noise level, both HOPLS and N-PLS showed enhanced 
predictive abilities over PLS. 

4. 1.2 Datasets with tensor structure 
Note that the datasets generated by |29) do not origi- 
nally possess multi-way data structures although they 
were organized in a tensor form, thus the structure 
information of data was not important for prediction. 
We here assume that HOPLS is more suitable for the 
datasets which originally have multi-way structure, i.e. 
information carried by interaction among each mode are 
useful for our regression problem. In order to verify 
our assumption, the independent data X and dependent 
data Y were generated according to the Tucker model 
that is regarded as a general model for tensors. The latent 
variables t were generated in the same way as described 
in Section 



10dB 



5dB 



4.1.1 



A sequence of loadings p( n ),Q( m ) and 
the core tensors were drawn from A/"(0, 1). For the vali- 
dation dataset, the latent matrix T was generated from 
the same distribution as the calibration dataset, while 
the core tensors and loadings were fixed. Similarly to the 
study in Section 4.1.1 to investigate how the prediction 
performance is affected by noise levels and sample size, 
{X,Y} e R 20 * 10 * 10 (Case 1) and {X,X} e R 10 * 10 * 10 
(Case 2) were generated under noise levels of lOdB, 5dB, 
OdB and -5dB. The datasets for both cases were generated 
from five latent variables. 

TABLE 2: The selection of parameters R and A in Case 2. 
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Fig. 5: Five-fold cross-validation performance of HOPLS at 
different noise levels versus the number of latent variables (R) 
and loadings (A). The optimal values for these two parameters 
are marked by green squares. 
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Fig. 6: The prediction performance comparison among HO- 
PLS, N-PLS and PLS at different noise levels for the two 
cases (i.e., Casel: {X, Y} e R 20xl0x10 and Case 2: {X, Y} e 



R 



) with different sample size. 



The optimal parameters of R and A were shown in 
Table [2] Observe that the optimal R is smaller with the 
increasing noise level for all the three methods. The 
parameter A in HOPLS was also shown to have a similar 
behavior. For more detail, Fig. [5] exhibits the cross- 
validation performance grid of HOPLS with respect to R 
and A. When SNR was lOdB, the optimal A was 4, while 
it were 2, 2 and 1 for 5dB, OdB and -5dB respectively. 
This indicates that the model complexity can be adapted 
to provide a better model when a specific dataset was 
given, demonstrating the flexibility of HOPLS model. 

The prediction performance evaluated over 50 valida- 
tion datasets using HOPLS, N-PLS and PLS with individ- 
ually selected parameters were compared for different 



noise levels and different sample sizes (i.e., two cases). 
As shown in Fig. |6| for both the cases, the prediction 
performance of HOPLS was better than both N-PLS 
and PLS at lOdB, and the discrepancy among them 
was enhanced when SNR changed from lOdB to -5dB. 
The performance of PLS decreased significantly with the 
increasing noise levels while HOPLS and N-PLS showed 
relative robustness to noise. Note that both HOPLS and 
N-PLS outperformed PLS when SNR<5dB, illustrating 
the advantages of tensor-based methods with respect to 
noisy data. Regarding the small sample size problem, 
we found the performances of all the three methods 
were decreased when comparing Case 1 with Case 2. 
Observe that the superiority of HOPLS over N-PLS and 



10 



PLS were enhanced in Case 2 as compared to Case 1 
at all noise levels. A comparison of Fig. [6] and Fig. [4] 
shows that the performances are significantly improved 
when handling the datasets having tensor structure by 
tensor-based methods (e.g., HOPLS and N-PLS). As for 
N-PLS, it outperformed PLS when the datasets have 
tensor structure and in the presence of high noise, but it 
may not perform well when the datasets have no tensor 
structure. By contrast, HOPLS performed well in both 
cases, in particular, it outperformed both N-PLS and PLS 
in critical cases with high noise and small sample size. 

4.1.3 Comparison on matrix response data 

In this simulation, the response data was a two-way 
matrix, thus HOPLS2 algorithm was used to evaluate 



the performance. X e IR 



5x5x5x5 



and Y e R 5x2 were 



generated from a full-rank normal distribution A/"(0, 1), 
which satisfies Y = X( X ) W where W was also generated 
from A/*(0, 1). Fig^A) visualizes the predicted and orig- 
inal data with the red line indicating the ideal prediction. 
Observe that HOPLS was able to predict the validation 
dataset with smaller error than PLS and N-PLS. The 
independent data and dependent data are visualized in 
the latent space as shown in Fig. [7^B). 



HOPLS model 
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Fig. 7: (A) The scatter plot of predicted against actual data 
for each model. (B) Data distribution in the latent vector 
spaces. Each blue point denotes one sample of the independent 
variable, while the red points denote samples of response 
variables. (C) depicts the distribution of the square error of 
prediction on the validation dataset. 



4.2 Decoding of ECoG signals 

In (46J, ECoG-based decoding of 3D hand trajectories 
was demonstrated by means of classical PLS regressiorj^] 
|49) . The movement of monkeys was captured by an 
optical motion capture system (Vicon Motion Systems, 
USA). In all experiments, each monkey wore a custom- 
made jacket with reflective markers for motion capture 

3. The datasets and more detailed description are freely available 
from http://neurotycho.org 



affixed to the left shoulder, elbows, wrists and hand, 
thus the response data was naturally represented as a 
3th-order tensor (i.e., time x 3D positions x markers). 
Although PLS can be applied to predict the trajectories 
corresponding to each marker individually, the structure 
information among four markers would be unused. The 
ECoG data is usually transformed to the time-frequency 
domain in order to extract the discriminative features for 
decoding movement trajectories. Hence, the independent 
data is also naturally represented as a higher-order 
tensor (i.e., channel x time x frequency x samples). 
In this study, the proposed HOPLS regression model 
was applied for decoding movement trajectories based 
on ECoG signals to verify its effectiveness in real-world 
applications. 
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Fig. 8: The scheme for decoding of 3D hand movement 
trajectories from ECoG signals. 

The overall scheme of ECoG decoding is illustrated in 
Fig. [8] Specifically, ECoG signals were preprocessed by a 
band-pass filter with cutoff frequencies at 0.1 and 600Hz 
and a spatial filter with a common average reference. 
Motion marker positions were down-sampled to 20Hz. 
In order to represent features related to the movement 
trajectory from ECoG signals, the Morlet wavelet trans- 
formation at 10 different center frequencies (10-150Hz, 
arranged in a logarithmic scale) was used to obtain the 
time-frequency representation. For each sample point of 
3D trajectories, the most recent one-second ECoG signals 
were used to construct predictors. Finally, a three-order 
tensor of ECoG features X e E /lX32x100 (samples x 
channels x time-frequency) was formed to represent 
independent data. 

We first applied the HOPLS2 algorithm to predict only 
the hand movement trajectory, represented as a matrix 
Y, for comparison with other methods. The ECoG data 
was divided into a calibration dataset (10 minutes) and 
a validation dataset (5 minutes). To select the optimal 
parameters of L n and R, the cross-validation was ap- 
plied on the calibration dataset. Finally, L n = 10 and 
R = 23 were selected for the HOPLS model. Likewise, 
the best values of R for PLS and N-PLS were 19 and 
60, respectively. The X-latent space is visualized in Fig. 
[9jA), where each point represents one sample of inde- 
pendent variables, while the Y-latent space is presented 
in Fig. [9|B), with each point representing one dependent 
sample. Observe that the distributions of these two latent 
variable spaces were quite similar, and the two dominant 
clusters are clearly distinguished. The joint distributions 
between each t r and u r are depicted in Fig. [9|C). Two 
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Fig. 9: Panels (A) and (B) depict data distributions in the X- 
latent space T and Y-latent space U, respectively. (C) presents 
a joint distribution between X- and Y-latent vectors. 



clusters can be observed from the first component which 
might be related to the 'movement 7 and 'non-movement 7 
behaviors. 




gamma band activity in the premotor cortex is associated 
with movement preparation, initiation and maintenance 
(50) 

From Table [3J observe that the improved prediction 
performances were achieved by HOPLS, for all the per- 
formance metrics. In particular, the results from dataset 1 
demonstrated that the improvements by HOPLS over N- 
PLS were 0.03 for the correlation coefficient of X-position, 
0.02 for averaged RMSEP, 0.04 for averaged Q 2 , whereas 
the improvements by HOPLS over PLS were 0.03 for 
the correlation coefficient of X-position, 0.02 for averaged 
RMSEP, and 0.03 for averaged Q 2 . 

Since HOPLS enables us to create a regression 
model between two higher-order tensors, all trajecto- 
ries recorded from shoulder, elbow, wrist and hand 
were contructed as a tensor Y £ R /lX3x4 (samples x 3D 
positions x markers). In order to verify the superiority 
of HOPLS for small sample sizes, we used 100 second 
data for calibration and 100 second data for validation. 
The resolution of time-frequency representations was im- 
proved to provide more detailed features, thus we have a 
4th-order tensor X £ R /lX 32x20x20 (samples x channels x 
time x frequency). The prediction performances from 
HOPLS, N-PLS and PLS are shown in Fig. [IT} illustrating 
the effectiveness of HOPLS when the response data 
originally has tensor structure. 



-1 -0.5 0-1 -0.5 0-1 -0.5 0-1 -0.5 0-1 -0.5 

Time(s) 

Fig. 10: (A) Spatial loadings corresponding to the first 
two latent components. Each row shows 5 significant loading 
vectors. Likewise, (B) depicts time-frequency loadings Pr 2 \ 
with p and 7-band exhibiting significant contribution. 



Another advantage of HOPLS was better physical in- 
terpretation of the model. To investigate how the spatial, 
spectral, and temporal structure of ECoG data were 
used to create the regression model, loading vectors can 
be regarded as a subspace basis in spatial and time- 
frequency domains, as shown in Fig. [lO] With regard 
to time-frequency loadings, the f3- and 7-band activities 
were most significant implying the importance of /3, 7- 
band activities for encoding of movements; the duration 
of /3-band was longer than that of 7-band, which indi- 
cates that hand movements were related to long history 
oscillations of /3-band and short history oscillations of 
7-band. These findings also demonstrated that a high 
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Fig. 11: The prediction performance of 3D trajectories recorded 
from shoulder, elbow, wrist and hand. The optimal R are 16, 
28, 49 for PLS, N-PLS and HOPLS, respectively, and A = 5 for 
HOPLS. 

Time-frequency features of the most recent one-second 
window for each sample are extremely overlapped, re- 
sulting in a lot of information redundancy and high com- 
putational burden. In addition, it is generally not neces- 
sary to predict behaviors with a high time-resolution. 
Hence, an additional analysis has been performed by 
down-sampling motion marker positions at 1Hz, to en- 
sure that non-overlapped features were used in any 
adjacent samples. The cross-validation performance was 
evaluated for all the markers from the ten minute cal- 
ibration dataset and the best performance for PLS of 
Q 2 = 0.19 was obtained using R = 2, for N-PLS it 
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TABLE 3: Comprehensive comparison of the HOPLS, N-PLS and PLS on the prediction of 3D hand trajectories. The numbers 
of latent vector for HOPLS, N-PLS and Unfold-PLS were 23, 60, and 19, respectively. 



, Q 2 (3D hand positions) RMSEP (3D hand positions) Correlation 
oG) 









X 


Y 


z 


Mean 


X 


Y 


z 


Mean 


X 


Y 


z 




HOPLS 


0.25 


0.43 


0.48 


0.61 


0.51 


0.82 


0.70 


0.66 


0.73 


0.67 


0.72 


0.78 


DS1 


N-PLS 


0.33 


0.39 


0.44 


0.59 


0.47 


0.85 


0.73 


0.68 


0.75 


0.64 


0.71 


0.77 




Unfold-PLS 


0.23 


0.39 


0.45 


0.59 


0.48 


0.85 


0.72 


0.68 


0.75 


0.64 


0.72 


0.77 
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U.4Z 


U.3U 




n no 


U.// 


U./z 


\J.OO 


n ic 

\J.OD 


U.54 


U./ 1 


DS2 


N-PLS 


0.33 


0.03 


0.40 


0.51 


0.32 


1.04 


0.78 


0.71 


0.84 


0.32 


0.64 


0.71 




Unfold-PLS 


0.22 


0.05 


0.40 


0.53 


0.32 


1.04 


0.78 


0.70 


0.84 


0.34 


0.63 


0.73 




HOPLS 


0.22 


0.36 


0.39 


0.48 


0.41 


0.74 


0.77 


0.66 


0.73 


0.62 


0.62 


0.69 


DS3 


N-PLS 


0.30 


0.31 


0.37 


0.46 


0.38 


0.77 


0.78 


0.68 


0.74 


0.61 


0.62 


0.68 




Unfold-PLS 


0.21 


0.30 


0.37 


0.46 


0.38 


0.77 


0.79 


0.67 


0.74 


0.61 


0.62 


0.68 




HOPLS 


0.16 


0.16 


0.50 


0.57 


0.41 


1.04 


0.66 


0.62 


0.77 


0.43 


0.71 


0.76 


DS4 


N-PLS 


0.23 


0.12 


0.45 


0.55 


0.37 


1.06 


0.69 


0.67 


0.80 


0.41 


0.70 


0.76 




Unfold-PLS 


0.15 


0.11 


0.46 


0.57 


0.38 


1.07 


0.69 


0.62 


0.79 


0.42 


0.70 


0.76 



was Q 2 = 0.22 obtained by R = 5, and for HOPLS 
it was Q 2 = 0.28 obtained by R = 24, A = 5. The 
prediction performances on the five minute validation 
dataset are shown in Fig. [l2j implying the significant 
improvements obtained by HOPLS over N-PLS and PLS 
for all the four markers. For visualization, Fig. 13 exhibits 
the observed and predicted 3D hand trajectories in the 
150s time window. 
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Fig. 12: The prediction performance of 3D trajectories for 
shoulder, elbow, wrist and hand using non-overlapped ECoG 
features. 



5 Conclusions 

The higher-order partial least squares (HOPLS) has been 
proposed as a generalized multilinear regression model. 
The analysis and simulations have shown that the ad- 
vantages of the proposed model include its robustness 
to noise and enhanced performance for small sample 
sizes. In addition, HOPLS provides an optimal tradeoff 
between fitness and overfitting due to the fact that 
model complexity can be adapted by a hyperparameter. 
The proposed strategy to find a closed-form solution 
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Fig. 13: Visualization of observed trajectories (150s time win- 
dow) and the trajectories predicted by HOPLS, N-PLS and PLS. 



for HOPLS makes computation more efficient than the 
existing algorithms. The results for a real-world applica- 
tion in decoding 3D movement trajectories from ECoG 
signals have also demonstrated that HOPLS would be a 
promising multilinear subspace regression method. 
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